This project was my entry in the February 2024 NVIDIA Generative AI on RTX Developer Contest (https://www.nvidia.com/en-us/ai-data-science/generative-ai/rtx-developer-contest/), a competition that challenged developers to build a Large Language Model (LLM) application that works with, and is built by, the TensorRT engine so that inference runs faster. I also added components that let the model run on consumer-grade GPU and CPU hardware.
I started from a researcher's open-source GitHub project for offloading transformer layers, modified it with Python code, and trained it in Google Colab so that it works with, and is optimized by, a TensorRT engine. I also added some safety features, such as using the '//' method to increase the importance the model places on following its identity and instructions during conversations.
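As a rough illustration of the layer-offloading idea, the sketch below splits a Mixtral checkpoint across GPU and CPU memory with the Hugging Face transformers and accelerate libraries. This is a minimal sketch under assumed tooling, not the researcher's original project or my exact code; the model ID, memory caps, and device split are all illustrative.

```python
# Minimal sketch of GPU/CPU transformer-layer offloading using Hugging Face
# transformers + accelerate (assumed stack; not the original offloading project).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # illustrative checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",  # accelerate places layers on the GPU first, overflow on CPU
    max_memory={0: "10GiB", "cpu": "48GiB"},  # hypothetical per-device caps
)

prompt = "Explain what a TensorRT engine does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")  # embeddings sit on GPU 0
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

With `device_map="auto"`, layers that do not fit in GPU memory stay in system RAM and are streamed in as needed, which is what makes a large mixture-of-experts model like Mixtral usable on a single consumer GPU.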
The resulting program generates text at a faster pace on a consumer-grade computer: it was trained and built into a TensorRT engine on consumer hardware and achieves a quick inference rate. The TensorRT engine was not originally designed to work with the Mixtral model, so I had to modify it.
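For the TensorRT side, NVIDIA's TensorRT-LLM examples typically convert the Hugging Face checkpoint, build an engine with the `trtllm-build` command, and then load it through the Python runtime. The sketch below is a hedged example of that last step only; `ModelRunner` names and argument signatures vary between TensorRT-LLM releases, and the engine path is a placeholder rather than this project's actual files.

```python
# Hedged sketch of running a pre-built TensorRT-LLM engine (assumed runtime API;
# paths are placeholders, and argument names may differ by tensorrt_llm version).
import torch
from transformers import AutoTokenizer
from tensorrt_llm.runtime import ModelRunner

ENGINE_DIR = "./mixtral_trt_engine"  # hypothetical output of `trtllm-build`
TOKENIZER_DIR = "mistralai/Mixtral-8x7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_DIR)
runner = ModelRunner.from_dir(engine_dir=ENGINE_DIR)

prompt = "Write one sentence about fast inference."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids[0].to(torch.int32)

# generate() takes a list of token-id tensors and returns (batch, beams, tokens).
output_ids = runner.generate(
    batch_input_ids=[input_ids],
    max_new_tokens=64,
    end_id=tokenizer.eos_token_id,
    pad_id=tokenizer.eos_token_id,  # Mixtral has no pad token; reuse EOS
)
print(tokenizer.decode(output_ids[0][0], skip_special_tokens=True))
```

Because TensorRT engines are compiled ahead of time for a specific model architecture, a mixture-of-experts model like Mixtral needed the modifications described above before it could be built and run this way.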
Google Colab Files: https://github.com/viasky657/GoogleCollabFiles